A Recursive Annotation Scheme for Referential Information Status

Authors

  • Arndt Riester
  • David Lorenz
  • Nina Seemann
Abstract

We provide a robust and detailed annotation scheme for information status, which is easy to use, follows a semantic rather than cognitive motivation, and achieves reasonable inter-annotator agreement scores. Our annotation scheme is based on two main assumptions: firstly, that information status strongly depends on (in)definiteness, and secondly, that it ought to be understood as a property of referents rather than words. Therefore, our scheme relies on overt (in)definiteness marking and provides different categories for each class. Definites are grouped according to the information source by which the referent is identified. A special aspect of the scheme is that non-anaphoric expressions (e.g. names) are classified according to whether their referents are likely to be known or unknown to an expected audience. The annotation scheme provides a solution for annotating complex nominal expressions, which may recursively contain embedded expressions. In annotating a corpus of German radio news bulletins, a kappa score of .66 was achieved for the full scheme; a core scheme of six top-level categories yields κ = .78.
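The kappa scores reported above measure agreement between annotators after correcting for chance. The abstract does not say which kappa variant was used, so the following is only a minimal sketch assuming Cohen's kappa for two annotators. The function name `cohen_kappa` and the labels `given` and `new` are illustrative placeholders, not the scheme's actual category inventory.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators, one label per item. The paper may have
    used a different kappa variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data, not taken from the corpus described above.
annotator_1 = ["given", "given", "new", "new"]
annotator_2 = ["given", "new", "new", "new"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

On this toy data the annotators agree on three of four items, and half of that agreement is expected by chance. Collapsing fine-grained labels into fewer top-level categories before calling the function is one way to compare a full scheme against a core scheme.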


Similar resources

Arndt Riester and Stefan Baumann: The RefLex Scheme – Annotation Guidelines

The purpose of the RefLex annotation scheme (Baumann and Riester 2012) is the two-dimensional analysis of textual or spoken corpus data with regard to referential information status (including coreference and bridging) as well as lexical information status (semantic relations). We provide some linguistic-philosophical background followed by detailed guidelines, which can be used in combination w...


Coreference, Lexical Givenness and Prosody in German

In this article we discuss some empirical results concerning the impact of different levels of information status (i.e. referents and words, respectively) on the prosodic realization of referential expressions in annotated corpora of read and spontaneous speech. Both at the referential and at the lexical level not only given and new but also intermediate classes of givenness/novelty have to be ...


Referential and Lexical Givenness:

The main objective of the paper is to show that for an adequate analysis of an item’s information status in spoken language two levels of givenness have to be investigated: a referential and a lexical level. This separation is a crucial step towards our goal to arrive at the best possible classification of nominal expressions occurring in natural discourse which reflects our understanding of al...


A Unified Representation For Morphological, Syntactic, Semantic, And Referential Annotations

This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper d...


An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...



Publication date: 2010